Diverse Image Captioning via GroupTalk

Authors

  • Zhuhao Wang
  • Fei Wu
  • Weiming Lu
  • Jun Xiao
  • Xi Li
  • Zitong Zhang
  • Yueting Zhuang
Abstract

Different people tend to describe the same image from different aspects because their perceptions are individual and subjective. Generating image descriptions that are both diverse and high quality is therefore of great importance. In this paper, we propose a framework called GroupTalk that learns multiple image caption distributions simultaneously and effectively mimics the diversity of captions written by human beings. In particular, a novel iterative update strategy is proposed to separate training sentence samples into groups and learn their distributions at the same time. Furthermore, we introduce an efficient classifier to address the non-linear and discontinuous nature of language distributions, which would otherwise impair performance. Experiments on several benchmark datasets show that GroupTalk naturally diversifies the generated captions of each image without sacrificing accuracy.
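The abstract does not spell out the iterative update strategy, so the following is only a minimal sketch of the general idea of alternating between grouping sentences and fitting one distribution per group. Everything in it is an assumption made for illustration: the function names (`fit_unigram`, `log_likelihood`, `group_captions`), the smoothed unigram models standing in for caption generators, the round-robin initialization, and the reassignment by plain likelihood argmax, which replaces rather than reproduces the classifier mentioned above.

```python
import math
from collections import Counter


def fit_unigram(captions, vocab, alpha=1.0):
    """Fit an add-alpha smoothed unigram model (toy stand-in for a caption model)."""
    counts = Counter(w for c in captions for w in c.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(caption, model):
    """Average per-word log-probability of a caption under a unigram model."""
    words = caption.split()
    return sum(math.log(model[w]) for w in words) / max(len(words), 1)


def group_captions(captions, num_groups=2, num_iters=10):
    """Alternate between fitting one model per group and regrouping captions.

    Hypothetical hard-assignment loop, not the procedure from the paper.
    """
    vocab = sorted({w for c in captions for w in c.split()})
    # Assumed initialization: deterministic round-robin assignment.
    assignment = [i % num_groups for i in range(len(captions))]
    models = []
    for _ in range(num_iters):
        # Step 1: fit one distribution per group from its current members.
        models = [
            fit_unigram(
                [c for c, a in zip(captions, assignment) if a == g], vocab
            )
            for g in range(num_groups)
        ]
        # Step 2: move each caption to the group whose model scores it highest.
        new_assignment = [
            max(range(num_groups), key=lambda g: log_likelihood(c, models[g]))
            for c in captions
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so stop early
        assignment = new_assignment
    return assignment, models
```

Calling `group_captions(["a dog runs on grass", "the sky is blue"], num_groups=2)` returns one group index per caption together with the fitted per-group word distributions. In a real captioning system each unigram model would be replaced by an image-conditioned sentence generator.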

Similar resources

Image Titles - Variations on Show, Attend and Tell

Inspired by recent advances in machine translation and object detection, we implement an image captioning pipeline, consisting of a Fully Convolutional Neural Network piping image features into an image-captioning LSTM, based on the popular Show, Attend, and Tell model. We implement the model in TensorFlow and recreate performance metrics reported in the paper. We identify and experiment with v...

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...

Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset

Image captioning task has become a highly competitive research area with application of convolutional and recurrent neural networks, especially with the advent of long short-term memory (LSTM) architecture. However, its primary focus has been a factual description of the images, mostly objects and their actions. While such focus has demonstrated competence, describing the images with non-factua...

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions. At the same time, a significant research effort has been dedicated to the development of saliency prediction ...

Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning

Modern neural image captioning systems typically adopt the encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation. Inspired by the robustness analysis of CNN-based image classifiers to adversarial perturbations, we propose Show-and-Fool, a novel algorithm for ...

Publication date: 2016